Data Report — Statlog (Heart)

Cost Matrix

_          absence   presence
absence    0         1
presence   5         0

where the rows represent the true values and the columns the predicted.
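
As a worked illustration of how this matrix is applied, the sketch below sums the cost over paired true and predicted labels. The function name `total_cost` and the toy label lists are hypothetical and are not part of the dataset or the report tooling.

```python
# Cost matrix from the report: keys are (true class, predicted class).
COST = {
    ("absence", "absence"): 0,
    ("absence", "presence"): 1,
    ("presence", "absence"): 5,
    ("presence", "presence"): 0,
}


def total_cost(y_true, y_pred):
    """Sum the misclassification cost over paired true/predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(COST[(t, p)] for t, p in zip(y_true, y_pred))


# Toy labels, made up for illustration (not taken from the dataset).
y_true = ["absence", "presence", "presence", "absence"]
y_pred = ["presence", "absence", "presence", "absence"]

# One false positive (cost 1) plus one missed case (cost 5).
print(total_cost(y_true, y_pred))  # 6
```

Missing a true case of heart disease costs five times as much as a false alarm, so a classifier tuned for this matrix should prefer predicting presence when in doubt.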

Documentation: Attribute Information:

  -- 1. age
  -- 2. sex
  -- 3. chest pain type (4 values)
  -- 4. resting blood pressure
  -- 5. serum cholesterol in mg/dl
  -- 6. fasting blood sugar > 120 mg/dl
  -- 7. resting electrocardiographic results (values 0, 1, 2)
  -- 8. maximum heart rate achieved
  -- 9. exercise induced angina
  -- 10. oldpeak = ST depression induced by exercise relative to rest
  -- 11. the slope of the peak exercise ST segment
  -- 12. number of major vessels (0-3) colored by fluoroscopy
  -- 13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

Attributes types

Real: 1, 4, 5, 8, 10, 12
Ordered: 11
Binary: 2, 6, 9
Nominal: 3, 7, 13

Variable to be predicted

Absence (1) or presence (2) of heart disease

Source: UCI dataset 145

SemMap JSON-LD: dataset.semmap.json · RDFa HTML

Overview

Metric        Value
Dataset       Statlog (Heart)
Source        UCI dataset 145
Rows          270
Columns       14
Discrete      9
Continuous    5
SemMap        SemMap JSON-LD, SemMap HTML
Missingness   Not modeled

Variables and summary

variable              inferred    dist
age                   continuous  54.4333 ± 9.1091 [29, 48, 55, 61, 77]
sex                   discrete    1: 183 (67.78%)
chest-pain            discrete    4: 129 (47.78%); 3: 79 (29.26%); 2: 42 (15.56%); 1: 20 (7.41%)
rest-bp               continuous  131.3444 ± 17.8616 [94, 120, 130, 140, 200]
serum-chol            continuous  249.6593 ± 51.6862 [126, 213, 245, 280, 564]
fasting-blood-sugar   discrete    1: 40 (14.81%)
electrocardiographic  discrete    2: 137 (50.74%); 0: 131 (48.52%); 1: 2 (0.74%)
max-heart-rate        continuous  149.6778 ± 23.1657 [71, 133, 153.5, 166, 202]
angina                discrete    1: 89 (32.96%)
oldpeak               continuous  1.0500 ± 1.1452 [0, 0, 0.8, 1.6, 6.2]
slope                 discrete    1: 130 (48.15%); 2: 122 (45.19%); 3: 18 (6.67%)
major-vessels         discrete    0: 160 (59.26%); 1: 58 (21.48%); 2: 33 (12.22%); 3: 19 (7.04%)
thal                  discrete    3: 152 (56.30%); 7: 104 (38.52%); 6: 14 (5.19%)
heart-disease         discrete    1: 150 (55.56%)

Fidelity summary

model       backend    disc jsd mean  disc jsd median  cont ks mean  cont w1 mean  downstream sign match
metasyn     metasyn    0.0355         0.0378           0.1415        2.4032
clg_mi2     pybnesian  0.0293         0.0199           0.1126        3.0859
semi_mi5    pybnesian  0.0293         0.0199           0.1015        2.6707
ctgan_fast  synthcity  0.3115         0.237            0.8556        45.6654
tvae_quick  synthcity  0.0727         0.0685           0.2156        6.9723
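
For readers unfamiliar with the three distances in this table, here is a minimal pure-Python sketch of each. The report does not say how its values were computed, so the choices here are assumptions: JSD is taken as the base-2 divergence (not its square root), and W1 uses the sorted-values shortcut that is only valid when both samples have the same size, as they do here (270 real, 270 synthetic). Function names are illustrative.

```python
import math
from bisect import bisect_right
from collections import Counter


def jsd(real, synth):
    """Jensen-Shannon divergence (base 2, range 0 to 1) between category frequencies."""
    p, q = Counter(real), Counter(synth)
    total = 0.0
    for cat in set(p) | set(q):
        pi, qi = p[cat] / len(real), q[cat] / len(synth)
        mi = 0.5 * (pi + qi)  # mixture probability for this category
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total


def ks_statistic(real, synth):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(real), sorted(synth)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def wasserstein1(real, synth):
    """W1 distance for equal-sized samples: mean gap between sorted values."""
    if len(real) != len(synth):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(real), sorted(synth))) / len(real)


# Toy inputs, made up for illustration.
print(jsd(["a"], ["b"]))                   # 1.0 (disjoint categories)
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # 1.0 (no overlap)
print(wasserstein1([1, 2, 3], [2, 3, 4]))  # 1.0 (shifted by one unit)
```

KS and JSD are bounded by 1, while W1 is in the units of the variable, which is why the W1 column is dominated by wide-ranging variables such as serum-chol.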

Privacy summary

model       backend    n real  n synth  exact overlap rate  near duplicate rate eps  nn distance mean  k min  k pct lt5  k map  rare qi reproduction rate  identifiability score  delta presence
metasyn     metasyn    270     270      0                   0.9741                   0.0822            1      1          7      0                                                 2.25
clg_mi2     pybnesian  270     270      0                   0.9889                   0.0494            1      1          1      0                                                 5
semi_mi5    pybnesian  270     270      0                   0.9926                   0.0573            1      1          3      0                                                 1.8
ctgan_fast  synthcity  270     270      0                   0.4222                   0.2387            1      1          270    0                                                 0.6111
tvae_quick  synthcity  270     270      0                   0.9667                   0.0652            1      1          1      0                                                 13
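
The report does not define these privacy metrics. The sketch below shows one plausible reading of two of them, under stated assumptions: `exact_overlap_rate` as the share of synthetic rows that appear verbatim in the real data, and `k min` / `k pct lt5` as k-anonymity figures over a set of quasi-identifier columns. The function names, the choice of quasi-identifiers, and the toy rows are all hypothetical.

```python
from collections import Counter


def exact_overlap_rate(real_rows, synth_rows):
    """Share of synthetic rows that also appear verbatim among the real rows."""
    real_set = set(real_rows)
    return sum(row in real_set for row in synth_rows) / len(synth_rows)


def k_anonymity(rows, qi_idx):
    """Return (k_min, share of rows whose quasi-identifier group has fewer than 5 rows)."""
    keys = [tuple(row[i] for i in qi_idx) for row in rows]
    sizes = Counter(keys)  # group size per quasi-identifier combination
    k_min = min(sizes.values())
    pct_lt5 = sum(sizes[key] < 5 for key in keys) / len(keys)
    return k_min, pct_lt5


# Toy rows of (age, sex, chest_pain), made up for illustration.
real = [(63, 1, 4), (45, 0, 2), (45, 0, 3)]
synth = [(63, 1, 4), (50, 1, 1)]

print(exact_overlap_rate(real, synth))     # 0.5
print(k_anonymity(real, qi_idx=[0, 1]))    # (1, 1.0)
```

Under this reading, a `k min` of 1 means at least one record is unique on its quasi-identifiers, and a `k pct lt5` of 1 means every record sits in a group of fewer than five.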

Models


Real data

Model: metasyn (metasyn)

Per-variable fidelity
variable              type        KS      W1        JSD
age                   continuous  0.1037  1.3029
sex                   discrete                      0.0273
chest-pain            discrete                      0.0508
rest-bp               continuous  0.1185  2.6446
serum-chol            continuous  0.0593  4.524
fasting-blood-sugar   discrete                      0.0088
electrocardiographic  discrete                      0.0487
max-heart-rate        continuous  0.1111  3.3496
angina                discrete                      0.0169
oldpeak               continuous  0.3148  0.1948
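
The continuous columns of the fidelity summary appear to be plain means of the per-variable values above. This short check, using only numbers already in the report, reproduces the metasyn summary row (cont ks mean 0.1415, cont w1 mean 2.4032). The discrete columns cannot be checked the same way, because only five of the nine discrete variables are listed here.

```python
from statistics import fmean

# (KS, W1) per continuous variable, copied from the metasyn table above.
continuous = {
    "age": (0.1037, 1.3029),
    "rest-bp": (0.1185, 2.6446),
    "serum-chol": (0.0593, 4.524),
    "max-heart-rate": (0.1111, 3.3496),
    "oldpeak": (0.3148, 0.1948),
}

ks_mean = fmean(ks for ks, _ in continuous.values())
w1_mean = fmean(w1 for _, w1 in continuous.values())

print(round(ks_mean, 4), round(w1_mean, 4))  # 0.1415 2.4032
```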
Downstream metrics
metric value
sign_match_rate
formula heart_disease ~ age + sex + C(chest_pain, levels=[]) + rest_bp + serum_chol + fasting_blood_sugar + C(electrocardiographic, levels=[]) + max_heart_rate + angina + oldpeak + slope + major_vessels + C(thal, levels=[]) + age:sex + sex:C(chest_pain, levels=[]) + C(chest_pain, levels=[]):rest_bp + rest_bp:serum_chol + serum_chol:fasting_blood_sugar
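
The report leaves `sign_match_rate` blank and does not define it. One common reading, assumed here, is the fraction of model terms whose fitted coefficient has the same sign when the formula is fitted on real data and on synthetic data. The function and the coefficient values below are hypothetical and are not fitted to this dataset.

```python
def sign_match_rate(real_coefs, synth_coefs):
    """Fraction of shared terms whose coefficients have the same sign."""
    shared = [term for term in real_coefs if term in synth_coefs]
    if not shared:
        return float("nan")  # nothing to compare

    def sign(x):
        return (x > 0) - (x < 0)

    matches = sum(sign(real_coefs[t]) == sign(synth_coefs[t]) for t in shared)
    return matches / len(shared)


# Made-up coefficients for illustration only.
real = {"age": 0.02, "sex": 1.1, "oldpeak": 0.4, "max_heart_rate": -0.03}
synth = {"age": -0.01, "sex": 0.9, "oldpeak": 0.5, "max_heart_rate": -0.02}

print(sign_match_rate(real, synth))  # 0.75 (age flips sign, the other three agree)
```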
Privacy metrics
metric value
n_real 270
n_synth 270
exact_overlap_rate 0
near_duplicate_rate_eps 0.9741
nn_distance_mean 0.0822
k_min 1
k_pct_lt5 1
k_map 7
rare_qi_reproduction_rate 0
delta_presence 2.25
variable distribution
age core.normal
sex core.multinoulli
chest-pain core.multinoulli
rest-bp core.lognormal
serum-chol core.lognormal
fasting-blood-sugar core.multinoulli
electrocardiographic core.multinoulli
max-heart-rate core.normal
angina core.multinoulli
oldpeak core.truncated_normal
slope core.multinoulli
major-vessels core.multinoulli
thal core.multinoulli
heart-disease core.multinoulli

Model: clg_mi2 (pybnesian)

Per-variable fidelity
variable              type        KS      W1        JSD
age                   continuous  0.0667  1.0148
sex                   discrete                      0.0233
chest-pain            discrete                      0.0672
rest-bp               continuous  0.1593  3.5411
serum-chol            continuous  0.0704  7.5924
fasting-blood-sugar   discrete                      0.0089
electrocardiographic  discrete                      0.0166
max-heart-rate        continuous  0.0778  3.0444
angina                discrete                      0.0199
oldpeak               continuous  0.1889  0.2368
Privacy metrics
metric value
n_real 270
n_synth 270
exact_overlap_rate 0
near_duplicate_rate_eps 0.9889
nn_distance_mean 0.0494
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 5

Model: semi_mi5 (pybnesian)

Per-variable fidelity
variable              type        KS      W1        JSD
age                   continuous  0.0667  0.9959
sex                   discrete                      0.0233
chest-pain            discrete                      0.0672
rest-bp               continuous  0.1519  3.2886
serum-chol            continuous  0.0593  6.2695
fasting-blood-sugar   discrete                      0.0089
electrocardiographic  discrete                      0.0166
max-heart-rate        continuous  0.0667  2.6114
angina                discrete                      0.0199
oldpeak               continuous  0.163   0.1879
Privacy metrics
metric value
n_real 270
n_synth 270
exact_overlap_rate 0
near_duplicate_rate_eps 0.9926
nn_distance_mean 0.0573
k_min 1
k_pct_lt5 1
k_map 3
rare_qi_reproduction_rate 0
delta_presence 1.8

Model: ctgan_fast (synthcity)

Per-variable fidelity
variable              type        KS      W1        JSD
age                   continuous  0.8037  18.5177
sex                   discrete                      0.4127
chest-pain            discrete                      0.2243
rest-bp               continuous  0.8     47.6668
serum-chol            continuous  0.9926  108.7704
fasting-blood-sugar   discrete                      0.2114
electrocardiographic  discrete                      0.237
max-heart-rate        continuous  0.9963  52.3222
angina                discrete                      0.2317
oldpeak               continuous  0.6852  1.05
Privacy metrics
metric value
n_real 270
n_synth 270
exact_overlap_rate 0
near_duplicate_rate_eps 0.4222
nn_distance_mean 0.2387
k_min 1
k_pct_lt5 1
k_map 270
rare_qi_reproduction_rate 0
delta_presence 0.6111

Model: tvae_quick (synthcity)

Per-variable fidelity
variable              type        KS      W1        JSD
age                   continuous  0.2259  3.238
sex                   discrete                      0.0518
chest-pain            discrete                      0.0759
rest-bp               continuous  0.2519  6.9693
serum-chol            continuous  0.1519  17.1534
fasting-blood-sugar   discrete                      0.0837
electrocardiographic  discrete                      0.0685
max-heart-rate        continuous  0.1852  7.3049
angina                discrete                      0.0585
oldpeak               continuous  0.263   0.196
Privacy metrics
metric value
n_real 270
n_synth 270
exact_overlap_rate 0
near_duplicate_rate_eps 0.9667
nn_distance_mean 0.0652
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 13